220 research outputs found

    Psychoacoustical evaluation of the pitch-synchronous overlap-and-add speech-waveform manipulation technique using single-formant stimuli

    This article presents two experiments dealing with a psychoacoustical evaluation of the pitch-synchronous overlap-and-add (PSOLA) technique. This technique has been developed for modification of duration and fundamental frequency of speech and is based on simple waveform manipulations. Both experiments were aimed at deriving the sensitivity of the auditory system to the basic distortions introduced by PSOLA. In experiment I, manipulation of fundamental frequency was applied to synthetic single-formant stimuli under minimal stimulus uncertainty, level roving, and formant-frequency roving. In experiment II, the influence of the positioning of the so-called "pitch markers" was studied. Depending on the formant and fundamental frequency, experimental data could be described reasonably well by either a spectral intensity-discrimination model or a temporal model based on detecting changes in modulation of the output of a single auditory filter. Generally, the results were in line with psychoacoustical theory on the auditory processing of resolved and unresolved harmonics

    Influence of fine structure and envelope variability on gap-duration discrimination thresholds

    The goal of the study was to investigate whether the temporal resolution of the auditory system is influenced by the variability of the stimulus envelope. To do so, the ability to detect an increment in the duration of a temporal gap (the test gap) was measured with an adaptive 3-IFC procedure. The stimulus consisted of a series of 10-ms broadband noise pulses. The pulses were separated by a 10-ms silent period, or temporal gap. In the main experiments, the test gap was either the first or the last gap in a series of 21 pulses. The variability in the stimulus' envelope was controlled directly by applying a jitter to the onset of the individual pulses in the pulse trains. Additionally, the stimuli were presented with different fine structure variabilities which also induced differences in the variability of the envelope. The gap-discrimination thresholds for the jittered noise pulse trains showed strong dependence on the amount of jitter as long as the jitter was applied randomly leading to a different pattern for every stimulus. When the jitter was applied as a frozen jitter resulting in a constant pattern of pulses, the thresholds did not increase significantly. A similar result was obtained for the different fine structure variabilities. A frozen fine structure led to thresholds about 1 ms lower than those obtained with random noise stimuli. A measure for the envelope variability was provided by calculating the variances of the envelope spectrum of the gammatone-filtered stimuli. The results of the calculations show a qualitative correspondence to the experimental results
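
The stimulus timing described in this abstract can be sketched as follows. This is only an illustration: the function name `pulse_onsets_ms`, the uniform jitter distribution, and the use of a seeded generator to mimic "frozen" jitter are assumptions, not details taken from the study.

```python
import numpy as np

def pulse_onsets_ms(n_pulses=21, pulse_ms=10.0, gap_ms=10.0,
                    jitter_ms=0.0, rng=None):
    """Onset times (ms) of a train of noise pulses separated by silent gaps.

    jitter_ms shifts each onset by a value drawn uniformly from
    [-jitter_ms, +jitter_ms] (the distribution is an assumption).
    """
    # Nominal onsets: one pulse every (pulse + gap) milliseconds.
    nominal = np.arange(n_pulses) * (pulse_ms + gap_ms)
    if jitter_ms == 0.0:
        return nominal
    rng = np.random.default_rng() if rng is None else rng
    return nominal + rng.uniform(-jitter_ms, jitter_ms, size=n_pulses)

# Random jitter: a fresh pattern for every stimulus.
random_pattern = pulse_onsets_ms(jitter_ms=2.0)

# "Frozen" jitter: reusing the same seed reproduces one constant pattern.
frozen_a = pulse_onsets_ms(jitter_ms=2.0, rng=np.random.default_rng(0))
frozen_b = pulse_onsets_ms(jitter_ms=2.0, rng=np.random.default_rng(0))
```

With the default arguments the onsets fall on a regular 20-ms grid, which corresponds to the unjittered pulse train of 10-ms pulses and 10-ms gaps.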

    Modelling modulation perception: modulation low-pass filter or modulation filter bank?

    In current models of modulation perception, the stimuli are first filtered and nonlinearly transformed (mostly half-wave rectified). In order to model the low-pass characteristic of measured modulation transfer functions, the next stage in the models is a first-order low-pass filter with a typical cutoff frequency of 50 to 60 Hz. From physiological studies in mammals it is known that many neurons in, e.g., the inferior colliculus, show a bandpass characteristic in their sensitivity to amplitude modulation. Results from psychophysical studies of modulation masking also suggest some kind of bandpass analysis of modulation frequencies. Results of two experiments on modulation detection that allow discrimination between models incorporating a low-pass filter and those using a modulation filterbank are presented. In the first experiment, modulation detection thresholds were measured for noise carriers of bandwidths between 3 and 6000 Hz. In the second experiment, modulation detection for a sinusoidal carrier was measured in the presence of interfering modulation components with a bandpass characteristic in the modulation spectrum. The results from these experiments could not be simulated by a model including a modulation low-pass filter, but were successfully simulated by a model using a modulation filterbank
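
The low-pass style of model described in this abstract (half-wave rectification followed by a first-order low-pass filter) can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the one-pole discretization, and the 55-Hz example cutoff (picked from the 50 to 60 Hz range mentioned above) are not taken from the paper, and the initial band-pass filtering stage is omitted.

```python
import numpy as np

def half_wave_rectify(x):
    """Keep positive parts of the waveform, set negative parts to zero."""
    return np.maximum(x, 0.0)

def first_order_lowpass(x, cutoff_hz, fs):
    """One-pole recursive low-pass filter (a simple discretization of an RC filter)."""
    a = np.exp(-2.0 * np.pi * cutoff_hz / fs)  # feedback coefficient
    y = np.empty(len(x))
    state = 0.0
    for n, sample in enumerate(x):
        state = (1.0 - a) * sample + a * state
        y[n] = state
    return y

def lowpass_gain(mod_freq_hz, cutoff_hz):
    """Magnitude response of an ideal (analog) first-order low-pass filter."""
    return 1.0 / np.sqrt(1.0 + (mod_freq_hz / cutoff_hz) ** 2)

# Example: envelope of a 1-kHz tone modulated at 8 Hz (hypothetical values).
fs = 16000.0
t = np.arange(int(0.25 * fs)) / fs
stimulus = (1.0 + 0.5 * np.sin(2 * np.pi * 8.0 * t)) * np.sin(2 * np.pi * 1000.0 * t)
envelope = first_order_lowpass(half_wave_rectify(stimulus), 55.0, fs)
```

In such a model the sensitivity to modulation falls off smoothly above the cutoff, following `lowpass_gain`; a modulation filterbank would instead analyze the envelope in separate bands of modulation frequency.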

    Discriminability of statistically independent Gaussian noise tokens and random tone-burst complexes

    Hanna (1984) has shown that noise tokens with a duration of 400 ms are harder to discriminate than noise tokens of 100 ms. This is remarkable because a 400-ms stimulus potentially contains four times as much information for judging dissimilarity as the 100-ms stimulus. Apparently, the ability to use all information in a stimulus is impaired by some kind of limitation, e.g., a memory limitation (cf. Cowan 2000) or a limitation in the ability to allocate attentional resources (cf. Kidd and Watson 1992). In a first experiment, this study examined the influence of stimulus duration and bandwidth of Gaussian noise tokens on the ability to perform an auditory discrimination task. In a second experiment, the amount of potential information in a stimulus was decoupled from its duration in order to more carefully examine the properties of the memory or attention limitation that results in the discrimination impairment. Finally, a computational model that limits the amount of perceptual information is introduced as an attempt to model the findings of the first and second experiment

    A probabilistic model for robust acoustic localization based on an auditory front-end

    Although extensive research has been done in the field of localization, the degrading effect of reverberation and the presence of multiple sources on localization performance has remained a major issue. The classical approach to localizing an acoustic source in the horizontal plane is to search for the main peak in the cross-correlation function, which corresponds to the interaural time difference (ITD) between the two ears. Apart from ITD, the interaural level difference (ILD) can contribute to localization, especially at higher frequencies where the wavelength becomes smaller than the diameter of the head, leading to ambiguous ITD information. Motivated by the robust localization performance of the human auditory system, its peripheral stage is used as a front-end for binaural cue extraction. The interdependency of ITD and ILD on azimuth is a complex pattern that depends also on the room acoustics and is therefore learned by azimuth-dependent Gaussian mixture models. Multiconditional training is performed to incorporate the spread of the binaural features caused by multiple sources and the effect of reverberation. The trained localization model outperforms state-of-the-art localization techniques in simulated adverse acoustic conditions. Furthermore, the model is capable of generalizing to changes in the simulated room absorption and to unknown source/receiver combinations
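
The classical cross-correlation approach mentioned in this abstract can be sketched as below. This covers only the ITD peak search, not the auditory front-end or the Gaussian mixture models of the paper. The function name `estimate_itd`, the 1-ms search limit, and the sign convention (positive when the right-ear signal lags the left) are assumptions made for illustration.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=1e-3):
    """Estimate the interaural time difference (s) from the cross-correlation peak.

    A positive value means the right-ear signal lags the left-ear signal.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Full cross-correlation; output index i corresponds to lag i - (len(left) - 1).
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    # Restrict the search to physically plausible lags.
    keep = np.abs(lags) <= int(round(max_itd_s * fs))
    best_lag = lags[keep][np.argmax(xcorr[keep])]
    return best_lag / fs

# Example: white noise reaching the right ear 5 samples later than the left.
fs = 16000
rng = np.random.default_rng(1)
left = rng.standard_normal(2000)
right = np.concatenate([np.zeros(5), left[:-5]])
itd = estimate_itd(left, right, fs)
```

For a single source in anechoic conditions the peak is unambiguous; reverberation and competing sources add further peaks, which is the difficulty the abstract describes.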

    Model simulations of masked thresholds for tones in dichotic noise maskers (A)

    The study of masked thresholds in dichotic noise maskers is important for understanding the processing in binaural hearing. To simulate these thresholds, a psychoacoustically motivated perception model was used [T. Dau et al. (1995). "A quantitative model of the 'effective' signal processing in the auditory system: I. Model structure," submitted to J. Acoust. Soc. Am.]. This model, which has been successfully applied to several monaural psychoacoustical experiments, was extended by an additional binaural processing unit. The monaural model consists of a filterbank, half-wave rectifier, low-pass filter, and adaptation loops, which model the temporal processing. The binaural processing unit detects the interaural correlation and makes decisions based on the difference between the signals from both ears. Masked thresholds in the N0Sπ and NπS0 configurations, obtained as a function of noise masker frequency and bandwidth, were simulated and compared to new experimental measurements. The dependence on interaural delay and interaural decorrelation of the noise masker was also modeled and compared to data in the literature. In general, model simulations agree well with the main features seen in the measurements. [Work supported by DFG (Ho 1627/1-1) and by NIDCD (Grant DC00100).]

    Parametric coding of stereo audio

    Parametric-stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of parametric overhead to describe the stereo image. The stereo properties are analyzed, encoded, and reinstated in a decoder according to spatial psychoacoustical principles. The monaural signal can be encoded using any (conventional) audio coder. Experiments show that the parameterized description of spatial properties enables a highly efficient, high-quality stereo audio representation

    Binaural detection with spectrally nonoverlapping signals and maskers: evidence for masking by aural distortion products

    Thresholds were measured for diotic tonal signals in the presence of interaurally delayed bands of Gaussian noise. When the signal frequency was 525 Hz, the spectrum of the noise was either below (highest frequency, 450 Hz) or above (lowest frequency, 600 Hz) the frequency of the signal. When the signal frequency was 450 Hz, the spectrum of the noise was always above the signal frequency (lowest frequency, 600 Hz). Signals had a 250-ms duration and were temporally centered within the 300-ms long bursts of noise. The spectrum level of the noise was 60 dB. Thresholds obtained in all three conditions varied essentially sinusoidally with the interaural delay of the noise. For signals below the spectrum of the noise, the periodicities within the data were close to, but not identical with, the periodicities of the signals. This outcome is discussed in terms of masking produced by aural distortion products stemming from interactions within the bands of noise [cf. van der Heijden and Kohlrausch, J. Acoust. Soc. Am. 98, 3125–3134 (1995)]. For signals above the spectrum of the noise, the periodicities in the data suggested that masking was produced by components within the band of noise. Patterns within the data are also discussed in terms of limitations concerning the magnitude of external delays that can be matched by internal delays that are incorporated in modern models of binaural processing